ABSTRACT

Understanding spoken language requires the efficient
processing of the ambiguities that are inherent in speech.
This paper presents an approach that allows the efficient
incremental integration of speech recognition and language
understanding using Tomita's generalized LR-parsing
algorithm. For this purpose the GLR lattice parsing
algorithm [11] is revised so that an agenda mechanism can
be used to control the flow of computation of the parsing
process. Subsequently the HMM evaluations of the word
models are combined with a stochastic language model to
perform a beam search similar to [2, 1, 12], where chart
parsers are used for this purpose.



1. INTRODUCTION 

In [10] M. Tomita proposes a parsing algorithm
(Generalized LR-Parsing, GLRP) and extends it in [11] to an
algorithm that can parse whole word lattices. With grammars
for natural languages this algorithm often works more
efficiently than other parsing algorithms (see [10, 7]).

Nevertheless the lattice-GLRP is not very flexible and
requires the whole lattice to be parsed in a fixed order.
In spite of its efficiency, it therefore cannot be used in
real-size applications that must handle 500 or more word
hypotheses in each word lattice.

It is generally acknowledged that the problem regarding 
the size of the word lattices can only be solved by using 
heuristics that can guide the parsing process. Typically, 
the following two models are combined: an acoustic model 
that represents the probability that a certain word was ut- 
tered during a time interval and a language model, e.g. a 
probabilistic regular grammar that scores word sequences. 

In section 2 a revised version of the original GLRP is
presented. This new algorithm consists of three basic
actions that act on the core data structure of the GLRP,
the graph-structured stack. They are designed in such a way
that their instances may be processed in arbitrary order,
which makes it possible to put them into a control data
structure (agenda) and process them according to a chosen
strategy. This provides a new degree of control over the
order of processing of a parser that uses LR-parsing tables.

This also makes it possible, in the following sections, to
combine the basic actions with a heuristic scoring
function. This combination guides the search through the
lattice in order to find the word sequence with the best
evaluation, together with its syntactic derivation, very
fast.



2. THE REVISED GLRP 



First the crucial data structures are defined to be either
sets of some kind or Pascal-like records:

• Vertex (Time, State, LinkSet): A vertex represents
a left context. It can be referenced by its time and the
slr-table state it represents. Furthermore, it has a
set of links.

• Link (Node, PS): A link is a reference to a node in the
ParseForest that also connects a vertex (A) to a set of
predecessor vertices (PS). The slr-table lookup of the
state of these vertices PS together with the category of
Node yields the state of vertex A.

• Node (Cat, Start, End, Hypos) or (Cat, Start, End,
SubtrSeqs): A node can be uniquely identified by the
triple (Cat, Start, End). It references either a set of
word hypotheses, in which case Cat is a terminal, or a set
of sequences of subtrees, in which case Cat is a
nonterminal.

• Graph-Structured Stack, GSS: The GSS is a set
of links and vertices.

• SET OF Nodes, ParseForest: The ParseForest
represents all possible derivations.

• SET OF Actions, Agenda: All actions are placed on the
Agenda and are carried out according to some strategy.

• SET OF NewHypos, OldHypoActions: The NewHypo actions
that have already been executed.
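
The record definitions above can be sketched in Python as follows. This is only an illustrative sketch: the class and field names mirror the list above, but the `GSS` index keyed by (time, state), the `key` method and the use of `None` for an empty node are assumptions made here, not details taken from the implementation described in this paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass(eq=False)  # identity-based equality/hash, so records can be set members
class Node:
    cat: str
    start: int
    end: int
    hypos: Set[tuple] = field(default_factory=set)       # used if cat is a terminal
    subtr_seqs: List[tuple] = field(default_factory=list)  # used if cat is a nonterminal

    def key(self) -> Tuple[str, int, int]:
        # A node is uniquely identified by (Cat, Start, End).
        return (self.cat, self.start, self.end)


@dataclass(eq=False)
class Vertex:
    time: int
    state: int
    link_set: List["Link"] = field(default_factory=list)


@dataclass(eq=False)
class Link:
    node: Optional[Node]                                  # None: assumed "empty node"
    ps: Set[Vertex] = field(default_factory=set)          # predecessor vertices


class GSS:
    """Hypothetical container: vertices are looked up by (time, state)."""

    def __init__(self) -> None:
        self.vertices: Dict[Tuple[int, int], Vertex] = {}

    def find(self, time: int, state: int) -> Optional[Vertex]:
        return self.vertices.get((time, state))

    def add(self, vertex: Vertex) -> Vertex:
        self.vertices[(vertex.time, vertex.state)] = vertex
        return vertex
```

Keying the vertices by (time, state) follows the remark that a vertex "can be referenced by its time and the slr-table state it represents"; any other index with the same lookup behaviour would serve equally well.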

The new approach not only leads to a more flexible
algorithm, it also divides the work that has to be done by
the GLRP more clearly into three main mechanisms:

1. Shift: construct a new element in the GSS (if it does
not already exist)

2. Search: initiate new Shifts with non-terminal categories

3. NewHypo: initiate new Shifts with terminal categories

The basic actions are:



Shift(Vertex, Node, Time, State):

1. if ∃ V_i ∈ GSS, s.t. V_i = (T_i, S_i, LinkSet_i), T_i = Time
   and S_i = State, then
      if ∃ L_j ∈ LinkSet_i, s.t. L_j = (N_j, PS_j) and N_j = Node,
      then 2. else 3.
   else 4.

2. • if Vertex ∈ PS_j then return

   • add Vertex to PS_j

   • assume Vertex = (., State_l, LinkSet_l)

   • for all previous Search(Rule_nr, SubtrSeq, Link,
     EndingTime), s.t. Link = L_j do
        if |SubtrSeq| = |right-side(Rule_nr)| then
           add Shift(Vertex, N_m, EndingTime,
                     slr-table(State_l, HeadCat(Rule_nr)))
           to Agenda, where N_m is the node that was created
           by the shift action which was initiated by this
           previous search
        else
           for all links L_k = (N_k, .) ∈ LinkSet_l do
              add Search(Rule_nr, cons(N_k, SubtrSeq), L_k,
                         EndingTime) to Agenda

   • return

3. • create link L_j = (Node, {Vertex})

   • for all actions Search(Rule_nr, SubtrSeq, Link),
     s.t. Link = (., PS), V_i ∈ PS and
     |SubtrSeq| < |right-side(Rule_nr)|, do
        add Search(Rule_nr, cons(Node, SubtrSeq), L_j) to
        Agenda

   • return

4. • create link L_i = (Node, {Vertex});
     create V_i = (Time, State, {L_i});
     add L_i and V_i to GSS

   • for all NewHypo(H) ∈ OldHypoActions, where
     H = (ST, ET, K), ST = Time,
        for all categories C_j that are possible according to
        the lexical key K do
           if ∃ S_j, s.t. (Shift S_j) ∈ slr-table(State, C_j) then
              add Shift(V_i, N_j, ET, S_j) to Agenda, where
              N_j = (C_j, ST, ET, .)

   • SentinelLink = ({}, {V_i})

   • for all categories C_j,
        for all (Reduce Rule_nr) ∈ slr-table(V_i, C_j) do
           add Search(Rule_nr, (), SentinelLink, Time) to
           Agenda

   • return



NewHypo(H): H = (Start, End, Key)

   • for all categories C_j that are possible according to the
     lexical key Key do
        if ∃ N_k = (C_k, S_k, E_k, Hs_k) ∈ ParseForest, such that
        C_k = C_j, S_k = Start and E_k = End, then
           * add H to Hs_k
        else
           * create a node N_j = (C_j, Start, End, {H}) in the
             ParseForest
           * for all vertices V_i ∈ GSS, such that
             V_i = (Time_i, State_i, .), Time_i = Start,
             ∃ NewState and
             (Shift NewState) ∈ slr-table(State_i, C_j), do
                add action Shift(V_i, N_j, End, NewState) to the
                Agenda

   • store NewHypo(H) in OldHypoActions

   • return



Search(Rule_nr, SubtrSeq, Link, EndingTime):

1. if |SubtrSeq| = |right-side(Rule_nr)| then 2. else 3.

2. if ∃ N ∈ ParseForest, s.t. N = (C_i, ST_i, ET_i, StS_i),
   HeadCat(Rule_nr) = C_i, ST_i = T_k, V_k = (T_k, ., .),
   V_k ∈ PS_i, Link = (., PS_i), ET_i = EndingTime then

   • add SubtrSeq to StS_i; return

   else

   • create a node N_i = (HeadCat(Rule_nr), T_k,
     EndingTime, {SubtrSeq}) in the ParseForest

   • for all vertices V_m ∈ PS, where Link = (N, PS),
     V_m = (., S_m, .), C_i = HeadCat(Rule_nr) do
        add Shift(V_m, N_i, EndingTime,
                  slr-table(S_m, C_i)) to Agenda

   • return

3. • for all V_i ∈ PS, s.t. Link = (., PS), V_i = (., ., LS_i),
     and for all L_j ∈ LS_i, where L_j = (N_j, .) do
        add Search(Rule_nr, cons(N_j, SubtrSeq), L_j,
                   EndingTime) to Agenda

   • return
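
The way a chain of Search actions walks back through the graph-structured stack can be illustrated with a small Python sketch. It is a deliberate simplification under stated assumptions: the classes `Vertex` and `Link` and the function `reduction_paths` are hypothetical names, nodes are plain strings, the ParseForest and the slr-table are left out, and a local queue stands in for the Agenda.

```python
from collections import deque


class Vertex:
    def __init__(self, time, state):
        self.time, self.state, self.links = time, state, []


class Link:
    def __init__(self, node, preds):
        self.node, self.preds = node, list(preds)


def reduction_paths(vertex, rhs_len):
    """Return (subtree_seq, left_vertex) pairs for every path of rhs_len
    links leading back from vertex. Each queue entry mimics one Search."""
    sentinel = Link(None, [vertex])        # plays the role of SentinelLink
    agenda = deque([((), sentinel)])       # entries are (SubtrSeq, Link)
    results = []
    while agenda:
        seq, link = agenda.popleft()
        if len(seq) == rhs_len:
            # Step 2: the right-hand side is complete; a Shift would now be
            # initiated from every predecessor vertex of the last link.
            results.extend((seq, v) for v in link.preds)
        else:
            # Step 3: extend the sequence along every link of every predecessor.
            for v in link.preds:
                for nxt in v.links:
                    agenda.append(((nxt.node,) + seq, nxt))  # cons(N, SubtrSeq)
    return results
```

For a stack in which `v2` is reached from `v1` via "N" and `v1` from `v0` via "Det", `reduction_paths(v2, 2)` yields the single sequence `("Det", "N")` with left vertex `v0`. With `rhs_len` equal to 0 the start vertex itself is returned, which is in line with the remark that empty productions need no special treatment.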

The main routine is quite simple: 

1. vertex V = (0, 0, {}), GSS = { V }.

2. initialize the Agenda with one NewHypo action for each 
word hypothesis of the lattice 

3. until there are no more actions on the Agenda do 

take one action from the Agenda and carry it out 
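
Step 3 of the main routine can be sketched as a generic agenda loop. The sketch below is an assumption-laden illustration, not the implementation described here: `Agenda` and `run` are hypothetical names, actions are modelled as Python callables that receive the agenda, and a priority queue is only one possible strategy for choosing the next action.

```python
import heapq
import itertools


class Agenda:
    """Priority agenda: the highest score is taken first, ties in insertion order."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()   # tie-breaker, keeps actions uncompared

    def add(self, action, score=0.0):
        heapq.heappush(self._heap, (-score, next(self._counter), action))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __bool__(self):
        return bool(self._heap)


def run(agenda):
    """Take actions from the agenda and carry them out until none are left."""
    executed = 0
    while agenda:
        action = agenda.pop()
        action(agenda)      # an action may place follow-up actions on the agenda
        executed += 1
    return executed
```

Because the loop never inspects what kind of action it executes, replacing the priority queue by a stack or a FIFO queue changes only the order of processing, which is the degree of freedom the revised algorithm is designed to offer.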



Besides, ε-productions no longer need to be handled
separately. As a result, the Common Lisp code of an
implementation of the revised GLRP has shrunk by about 15%
compared with an implementation of the original GLRP.

An implementation of the beam search agenda GLRP is
available by FTP from faui80.informatik.uni-erlangen.de,
"/pub/lisp/parser/glr-lattice-parser.tar.gz". Or send
e-mail.



3. WORST CASE BEHAVIOUR OF THE
REVISED GLRP

The flexibility of the revised GLRP should not incur heavy
costs on the runtime behaviour. The crucial places to
look for a decrease in performance compared to Tomita's
GLRP are the steps (for-loops and existential conditions)
where some instances (vertices, previous search actions,
etc.) must be retrieved. However, each of these steps
either has an equivalent action in Tomita's GLRP or the
instances can be retrieved trivially by means of some
additional information that must be added to the
Graph-Structured Stack (for a thorough description of the
new data structure please refer to [9]). Therefore the
worst case behaviour of the revised GLRP is of the same
order as Tomita's.



4. BEAM SEARCH

In order to combine the GLRP with heuristic scores it is
necessary to define the scoring function and to give
reasons for its choice.



4.1. A Metric For The Beam Search

At least the following design criteria should be met by
the metric (see also [12]):

1. It should combine a bigram model with the acoustic
score of the HMM word models.

2. Evaluations of different actions should be comparable.

3. The complete left context information should be
considered.

To ensure the comparability of different actions the
probabilities of the bigram model and the word model are
normalized. The log probability of the bigram model is
divided by the number of bigram operations that have been
applied, while the log probability of the word model is
divided by the number of time units the word spans
(nevertheless, this is an ad-hoc normalization that must
be improved in the long run):

$$\log P_{normal}(w \mid \text{Bigram}) = \frac{\log P(w \mid \text{Bigram})}{\#(\text{BigramOperation})}$$

$$\log P_{normal}(w \mid \text{HMM}) = \frac{\log P(w \mid \text{HMM})}{\#(\text{Frames})}$$

Since both schemes are often not drawn from the same
test sample and are seldom really equally important, it is
useful to combine them with an adjustment parameter λ
(eq. (1)). The value of this parameter λ can be found by
experiments or an optimization procedure.

$$\log P_{normal}(w \mid \text{N-Gram}, \text{HMM}) = \lambda \cdot \log P_{normal}(w \mid \text{N-Gram}) + \log P_{normal}(w \mid \text{HMM}) \tag{1}$$

For the purpose of incorporating the left context
probability the evaluations are partitioned into inside
evaluations and outside evaluations.

Furthermore, they are defined on the corresponding links
instead of the respective word sequences, because the
same word sequence may be a continuation of different
left contexts and the respective probabilities under these
different conditions may vary. That is, different links may
have the same or different nodes that cover the same word
sequence. The left context of a link can easily be
identified by its set of predecessor vertices.

The inside evaluation of a link L with a node covering the
word sequence w is given by equation (2). The outside
evaluation of a link L with predecessor vertex K and a
bigram model is defined recursively by eq. (3). If K is
the start vertex, then in eq. (3) "log P_outside(K)" must
be substituted by "0" and "last_word(K)" by
"*BEGIN-MARKER*".

$$\log P_{inside}(L) = \begin{cases} \log P_{normal}(w \mid \text{N-Gram}, \text{HMM}), & \text{if } length(w) > 0 \\ 0, & \text{if } length(w) = 0 \end{cases} \tag{2}$$

$$\log P_{outside}(L) = normalize\Big( denormalize(\log P_{outside}(K)) + denormalize(\log P_{inside}(L)) + \lambda \cdot \log P(\text{first\_word}(L) \mid \text{last\_word}(K), \text{Bigram}) \Big) \tag{3}$$

The function denormalize extracts the acoustic and the
n-gram score and undoes the normalization over the
respective length. This is necessary, since the normalized
scores cannot be combined directly. normalize is the
corresponding inverse function.

If a link has a node that spans several word sequences, a
maximization over all sequences must be done, since the
best analysis is wanted.
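
One possible way to realize the normalization and the recursive outside evaluation is sketched below in Python. Everything in it is an assumption made for illustration: the record `Evaluation` and the functions `normalized_score` and `outside` are hypothetical names, the denormalized form is stored explicitly (raw log probabilities plus the lengths used for normalization), and the weight is applied once at normalization time rather than to each bigram term separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evaluation:
    """Denormalized scores together with the lengths needed to normalize them."""
    acoustic_logp: float = 0.0   # sum of log P(w | HMM)
    frames: int = 0              # number of time units spanned
    bigram_logp: float = 0.0     # sum of log P(w | Bigram)
    bigram_ops: int = 0          # number of bigram operations applied


def normalized_score(ev, lam):
    """Weighted sum of the length-normalized bigram and acoustic scores."""
    bigram = ev.bigram_logp / ev.bigram_ops if ev.bigram_ops else 0.0
    acoustic = ev.acoustic_logp / ev.frames if ev.frames else 0.0
    return lam * bigram + acoustic


def outside(pred_outside, inside, transition_logp):
    """Combine the left context, the inside part and one bigram transition
    (last word of the left context to first word of the link), all in
    denormalized form."""
    return Evaluation(
        pred_outside.acoustic_logp + inside.acoustic_logp,
        pred_outside.frames + inside.frames,
        pred_outside.bigram_logp + inside.bigram_logp + transition_logp,
        pred_outside.bigram_ops + inside.bigram_ops + 1,
    )


START = Evaluation()   # the outside evaluation of the start vertex counts as 0
```

Keeping the raw sums and the lengths side by side makes denormalization trivial, since nothing has to be reconstructed from an already divided value; the normalized score is computed only when two actions have to be compared.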



4.2. Integrating The Beam Search Into The Re- 
vised GLRP 

In the implemented version the Shift actions are scored
according to the outside evaluation of the new link. The
other two types of action could, of course, also be scored
and processed according to their scores. For reasons of
simplicity, however, the Search actions and the NewHypo
actions are handled with a stack that has a higher priority
than the Shift actions. This is justified because most
Search actions only act along paths with good evaluations,
since the Shift actions construct almost exclusively such
paths.

The beam search strategy itself consists of two stages:

1. The algorithm works through the lattice
time-incrementally. It evaluates all possible actions
during each time frame, but processes only those whose
score lies within a beam around the best current score.
All other actions are saved on the "PrunedAgenda".

2. If no parse is found during stage 1, the actions
that were saved on the "PrunedAgenda" are processed
with a best-first search.
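
The two stages can be sketched as follows. This is a strongly simplified illustration under stated assumptions: `beam_parse`, `actions_by_frame`, `beam_width` and `is_complete` are hypothetical names, the scored actions of each time frame are given in advance, and executing an action does not create new actions as it would in the parser.

```python
import heapq


def beam_parse(actions_by_frame, beam_width, is_complete):
    """actions_by_frame: one list of (score, action) pairs per time frame.
    is_complete: callable that tells whether a parse has been found."""
    executed, pruned = [], []
    # Stage 1: time-incremental pass; only actions inside the beam are executed.
    for frame_actions in actions_by_frame:
        if not frame_actions:
            continue
        best = max(score for score, _ in frame_actions)
        for score, action in frame_actions:
            if score >= best - beam_width:
                executed.append(action)
            else:
                # Saved on the PrunedAgenda; the running index breaks ties.
                heapq.heappush(pruned, (-score, len(pruned), action))
    # Stage 2: best-first search over the PrunedAgenda until a parse exists.
    while pruned and not is_complete(executed):
        _, _, action = heapq.heappop(pruned)
        executed.append(action)
    return executed
```

Stage 2 touches the pruned actions only when stage 1 has failed, so in the common case the cost of the fallback is limited to storing the pruned actions.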



5. EXPERIMENTS 

Tests were carried out with an slr-table for a CFG of 1560
rules. Ten word lattices were parsed with the described
beam search strategy. In these lattices the word hypothesis
families had already been reduced to single word
hypotheses, and the number of single hypotheses per lattice
ranged between 56 and 202. In combination with the acoustic
scores from the decoder, a bigram model with rather high
perplexity (≈ 52) was used to guide the search. Under these
conditions 80% of the lattices could be handled
successfully and, with one exception, the parsing of the
recognized structures took less than 4 seconds.



6. RELATED WORK 

There have been various systems that employ stochastic
control, different strategies and, more rarely,
GLR-parsing techniques. For example, Shikano [8] shows how
to use n-gram models. Paseler et al. [3] published a beam
search method. This method is combined with an n-gram model
by Paseler and Ney [2]. Wrigley and Wright, e.g. [13],
use a probabilistic CFG as their language model. In [6]
best-first search is demonstrated by T. Seneff. In H. Ney
[1] sets of phoneme hypotheses are analysed with a beam
search strategy and the use of n-gram models in this
context is explained. L. Schmid uses the A*-algorithm to do
a best-first search over the number of word hypotheses in
[5].

However, none of them presents a general approach to 
guide the GLR-parsing process with stochastic informa- 
tion and especially to combine GLR-parsing with a beam 
search strategy. 



7. CONCLUSION 

In this paper a revised version of Tomita's GLR-parsing
algorithm is described that allows the flexible use of
different strategies. It is combined with a beam search
strategy to parse word lattices and return a packed forest
representation of a number of parse trees with good scores.
While theoretical considerations show that the worst case
behaviour of the revised algorithm is of the same order
as that of Tomita's original algorithm, the experiments
demonstrate that the new algorithm might be used in
nontrivial applications.

The experiments also indicate that GLRP itself is useful.
For example, Schabes [4] argues that GLRP is heavily
handicapped because the number of slr-table states can be
exponential in the number of CFG rules. However, the CFG of
1560 rules that is used in the experiments generates only
about 4500 states. This supports Tomita's claim that
grammars for natural languages are not too densely
ambiguous and that GLRP is therefore appropriate for
natural language parsing.